Support llama3 autoparallel + pipelining #1657
base: autoparallel
Conversation
So far just tested locally: `LOG_RANK=4 CONFIG_FILE=././torchtitan/models/deepseek_v3/train_configs/debug_model.toml ./run_train.sh --model.name llama3_auto_parallel --parallelism.pipeline_parallel_degree 2 --training.steps 100`. Runs and loss converges. Left one TODO about global batch size and gradient accumulation.
```python
pp_degree = job_config.parallelism.pipeline_parallel_degree
```
Unused pp_degree config; should probably raise an error when it's not the local world size.
I deleted it (it was unused/unneeded). I don't think we need to raise any error: pp_degree does not need to equal any particular size, and pp can even be disabled.
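For reference, the kind of guard being suggested would look roughly like this (hypothetical sketch, not what the PR does; it assumes torchrun's `LOCAL_WORLD_SIZE` environment variable and the `job_config` object from the snippet above):

```python
import os

# Hypothetical validation sketch: reject configs where the pipeline degree
# disagrees with the number of ranks on this host. The PR instead just
# deletes the unused variable, since pp_degree need not match any size.
pp_degree = job_config.parallelism.pipeline_parallel_degree
local_world_size = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))
if pp_degree not in (1, local_world_size):
    raise ValueError(
        f"pipeline_parallel_degree={pp_degree} does not match "
        f"LOCAL_WORLD_SIZE={local_world_size}"
    )
```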
```python
spmd_dims.append("tp")
spmd_mesh = world_mesh[spmd_dims]

dp_degree = 1
```
Same here: the config could specify dp_degree instead of hardcoding 1.
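A minimal sketch of what that could look like, reusing the `parallel_dims` fields that appear in the commented-out code later in this diff (hypothetical wiring, not the PR's code):

```python
# Hypothetical: derive dp_degree from the parallelism setup instead of
# hardcoding 1, following the dp_replicate * dp_shard product used elsewhere.
dp_degree = parallel_dims.dp_replicate * parallel_dims.dp_shard
```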
```diff
-                    inputs, target=targets, losses=losses, input_batch=inputs
+                    # TODO: input_batch kwarg only needed for CP, but
+                    # autoparallel doesn't accept kwargs in its forward
+                    inputs, target=targets, losses=losses  #, input_batch=inputs
```
Curious, why does CP need `input_batch`?
I assumed you would know. Am I wrong?
```python
pp_degree = job_config.parallelism.pipeline_parallel_degree
local_batch_size = job_config.training.local_batch_size
spmd_batch_size = local_batch_size
```
Oops, this is a bug for the non-PP case. It should be local * dp_degree, and put in an 'else' branch.
fixed
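A sketch of the fix described above, assuming torchtitan's usual `parallel_dims` fields; the actual commit may differ in detail:

```python
# Sketch of the described fix: for the non-PP case, scale the local batch
# size by dp_degree (presumably because the autoparallel SPMD program spans
# the full data-parallel batch); with PP enabled, keep the per-stage
# local batch size the PR already used.
local_batch_size = job_config.training.local_batch_size
if parallel_dims.pp_enabled:
    spmd_batch_size = local_batch_size
else:
    dp_degree = parallel_dims.dp_replicate * parallel_dims.dp_shard
    spmd_batch_size = local_batch_size * dp_degree
```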
```diff
@@ -492,11 +492,13 @@ def forward_backward_step(
             )
             if self.pp_has_first_stage:
                 self.pp_schedule.step(
-                    inputs, target=targets, losses=losses, input_batch=inputs
+                    # TODO: input_batch kwarg only needed for CP, but
+                    # autoparallel doesn't accept kwargs in its forward
```
Can we just fix this LOL
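If it can't be fixed on the autoparallel side, one possible workaround is a thin wrapper that accepts the CP-only kwarg and drops it before calling the compiled module (hypothetical sketch; `KwargsAdapter` is not part of the PR):

```python
import torch.nn as nn


class KwargsAdapter(nn.Module):
    """Hypothetical shim: the pipeline schedule passes input_batch as a kwarg
    (only needed for context parallelism), but the autoparallel module's
    forward takes positional args only, so we swallow the kwarg here."""

    def __init__(self, inner: nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, *args, input_batch=None, **kwargs):
        # input_batch is intentionally ignored: the autoparallel graph does
        # not consume it, and CP is not exercised in this configuration.
        return self.inner(*args, **kwargs)
```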
```python
# # step.
# dp_degree = parallel_dims.dp_replicate * parallel_dims.dp_shard
# global_batch_size = job_config.training.local_batch_size * dp_degree
if parallel_dims.pp_enabled and pp_rank > 0:
```
What a mess. No action needed here, but it's definitely worth thinking about what the terminal UX state should be.
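For context, the commented-out lines above are the bookkeeping the description's TODO refers to. A hedged sketch of how global batch size and gradient accumulation could tie together (the `global_batch_size` config field and its non-positive-means-unset convention are assumptions, not facts from this PR):

```python
# Hypothetical sketch of the global-batch-size / gradient-accumulation TODO.
# Only data-parallel ranks multiply the batch; PP stages see the same batch.
dp_degree = parallel_dims.dp_replicate * parallel_dims.dp_shard
local_batch_size = job_config.training.local_batch_size
global_batch_size = getattr(job_config.training, "global_batch_size", -1)
if global_batch_size <= 0:
    # Assumed convention: non-positive means "derive it, no accumulation".
    global_batch_size = local_batch_size * dp_degree
assert global_batch_size % (local_batch_size * dp_degree) == 0
grad_accum_steps = global_batch_size // (local_batch_size * dp_degree)
```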